
SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples



Abstract

Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of each SNP candidate by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine the posterior probability that the SNP is real, as well as the most probable phased genotype call for each individual. We present experiments on both simulated data and real data from the 1000 Genomes Project to demonstrate the applicability of the methods. We also explore the tradeoff between sequencing depth and the number of sequenced samples.
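
To make the Bayesian step concrete, the following is a minimal sketch of a per-site genotype posterior computed from low-coverage read counts. It is not the ARG-based method of the paper: it assumes a simple binomial read-error model, a Hardy-Weinberg genotype prior, and a known alternate-allele frequency. The function names and the `error_rate` and `alt_freq` parameters are hypothetical and chosen only for illustration.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Return P(reads | genotype) for 0, 1 and 2 copies of the alternate allele.

    Assumption: each read is drawn independently and is miscalled with
    probability `error_rate` (a binomial model, not the paper's model).
    """
    total_reads = ref_reads + alt_reads
    likelihoods = []
    for alt_copies in (0, 1, 2):
        alt_fraction = alt_copies / 2
        # Probability that a single read shows the alternate allele.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        likelihoods.append(
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** ref_reads
        )
    return likelihoods


def genotype_posteriors(ref_reads, alt_reads, alt_freq, error_rate=0.01):
    """Return P(genotype | reads) under a Hardy-Weinberg prior (an assumption)."""
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    likelihoods = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Example: a diploid sample covered by four reads, two of which carry the
# alternate allele, at a site with an assumed alternate-allele frequency of 0.5.
posteriors = genotype_posteriors(ref_reads=2, alt_reads=2, alt_freq=0.5)
most_probable_genotype = max(range(3), key=lambda g: posteriors[g])
```

With no reads at a site the posterior equals the prior, which shows why low-coverage data benefit from a prior informed by other samples, such as the shared haplotype information described in the abstract.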
